Apparatus and method for image recognition

ABSTRACT

An apparatus, method and computer program for recognizing one or more images within digitized image data that might include a target image desired to be located. The system generates a set of domain blocks from the image data, where each domain block represents a discrete portion of the image data, and a set of range blocks from one or more target images. Either the domain blocks, the range blocks, or both, are transformed by one or more substantially affine transformations with predetermined coefficients to create possible variants of the images. A comparison between the blocks is made to determine similarity, and includes at least a measurement of whether better matching is achieved when a range block is chosen from image data representing the image which is the source of the domain block or when chosen from other image data.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention generally relates to computer systems for image processing. More particularly, the present invention relates to apparatuses and methods to automatically search image data to detect the presence, or likelihood of presence, of specific target images.

2. Description of the Related Art

There are several systems that analyze data to detect the presence of specific images in data. Analog images can occur in a variety of media including photographs, photographic slides, television images in various formats including HDTV, images on computer monitors, holograms, x-rays, sonograms and radar. The analog imagery can be digitized and thus represented by an array of pixels where each pixel may be determined by one or more spectral bands of digital data. As examples, a digital image can be a 1024×1024 array of pixels where each pixel is specified by a 0 or a 1 denoting black or white respectively, or an integer between 0 and 255 denoting 256 shades of gray, or 3 integers between 0 and 255 denoting a red, blue and green component, respectively, with 256 levels for each component, or an integer between 0 and 1023 denoting 1024 infra-red levels, or 256 integers each between 0 and 255 and each denoting the output at a different spectral level from a multi-spectral satellite sensor. Digital imagery can also be created directly through computer graphics software and systems as well as through imaging sensors whose output is natively digital.

There are many existing applications that utilize some form of image recognition within a set of data, such as fingerprint identification, quality control for manufacturing, optical character recognition on scanned documents, automatic target recognition for weapons systems, and face recognition. A critical function of these systems is to indicate the likelihood of a match when a target image is indicated by the image data. And in circumstances where the image data is voluminous, such as satellite data or all images available on the Internet, human review of all of the data is impractical.

The prior art image recognition methods generally fall into two categories: constrained and unconstrained. Constrained methods of image recognition work only with images that have a specific structure. For example, one commercial application that performs face recognition based on biometrics requires the calculation of distances between eyes, and between nose and mouth, and may rely on other like relational computations. Such a face recognition calculation normally does not make sense on a general type of image, such as a boat or airplane, since the underlying structure of an image of a boat or plane is not the same as that of a face. More generally, constrained image recognition methods are typically not effective on imagery without the specific structure assumed by the method, since the sought image may have very different relational aspects than the target image, yet still satisfy the search criteria.

Constrained methods, however, typically require relatively high quality imagery, such that the underlying structure of the images analyzed by the recognition method can be observed, and thus are not suitable for use with poor quality image data. For example, a drug enforcement agent may take a photograph of an individual in a moving car and obtain a blurry image, or may take a telephoto picture of a subject and later want to identify other people in the background of the photo who may be severely out of focus. Another problem occurs in video imagery of a crowd scene, such as at a football stadium, where very tiny images of large numbers of people are present. Once the imagery is too small, or too out of focus, or too blurred, the calculations required in the constrained biometric face recognition system cannot be performed accurately. More generally, once the quality of imagery is too poor, any constrained system cannot take advantage of its underlying structural assumptions and then fails to perform accurately.

On the other hand, unconstrained image recognition systems do not utilize the underlying structure of an image as a basis for comparison, and thus do not suffer the same problems in reviewing low quality image data. Prior art unconstrained image recognition systems, however, do not work well with complex imagery. For example, given two images A and B, specified by square pixel arrays of data, where each pixel is specified by an integer between 0 and 255, one measure of the difference, ∥A-B∥, between A and B is the square root of the sum of the squares of the differences between the corresponding pixel values. In such manner, A and B would be said to be similar when ∥A-B∥ is small. However, this measurement may be very large even when the difference between A and B is barely noticeable. For example, create an image B from an image A where all columns of pixels of image A are shifted to the right by one pixel, and a new column of pixels is added at the left which just duplicates the new second column. If A is a 1024×1024 image, A and B will look very similar to a human observer. For example, if A is a picture of a boat, then B will also look like a boat, slightly shifted to the right. However, ∥A-B∥ may be large since every pixel in A will be different from the corresponding pixel in B. In some practical image recognition applications, it is desirable to recognize whether one or more objects in image A are also present in, or absent from, image B, not only when they are slightly modified digitally as in the above example, but when they are acquired under very different conditions. Examples of such disparate acquisition conditions include different image acquisition times leading to different conditions of use, with different cameras or other sensors, from different distances from sensor to object(s) or scenes, at different perspectives, under different lighting conditions, different environmental conditions, and in the context of different backgrounds and in the presence or absence of other objects. Moreover, the target object(s) of interest may be partially obscured in a different manner in A and B, and may be rotated, scaled and translated relative to each other or to the background. Additionally, imaging systems and computer software may further distort the imagery, such as through the application of compression to facilitate storage and transmission and through the insertion of special effects. Prior art unconstrained image recognition methods do not deal effectively with the complexity arising from significantly different conditions of image data acquisition.
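The pixel-wise distance and the column-shift example can be made concrete with a short Python sketch; the function name image_distance and the use of a random test image are illustrative assumptions rather than anything specified above.

```python
import numpy as np

def image_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Root-sum-of-squares distance described above:
    sqrt(sum over pixels of (A(i,j) - B(i,j))^2)."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sqrt(np.sum(diff * diff)))

# A stand-in 1024x1024 gray scale image with values 0..255.
rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=(1024, 1024))

# Shift every column one pixel to the right; the new left column
# duplicates the new second column, as in the example above.
b = np.empty_like(a)
b[:, 1:] = a[:, :-1]
b[:, 0] = b[:, 1]

# Visually near-identical images can still be far apart in this metric.
print(image_distance(a, b))
```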

Accordingly, it is desirable to have an improved system for image recognition that can adequately search realistic imagery data for the presence of target image(s) and/or image object(s) and successfully indicate the likelihood of target image(s) and/or image object(s) being present, even when the searchable imagery data and target imagery have significantly different conditions of acquisition. Such a system should be unconstrained by the target image or searchable image data structure and allow for variation in the appearance of the target image within image data. It is to the provision of such an apparatus and method for recognizing images within digitized data that the present invention is primarily directed.

SUMMARY OF THE INVENTION

The present invention is an apparatus, method, and computer program that can recognize specific images within a collection of digitized image data, or at least indicate the likelihood that a specific image is contained within the image data. In the system, a processor can either receive image data in a digital format or itself digitize data into target images, the collection of which forms the searchable image data. In the system, a processor also can either receive other image data in a digital format, or itself digitize data into query images. The system then generates a set of domain blocks from one or more query images, with each domain block representing a discrete portion of the query image data. A set of range blocks is then generated from the query image(s) and a predetermined one or more target images that are desired to be located within the searchable image data, with the range blocks corresponding to discrete portions of the one or more query images and of the target images from the searchable image data. To get additional potential appearances of the images, the range blocks are transformed by one or more substantially affine transformations with predetermined coefficients, such as an affine transformation composed of translation, scaling and rotation operations on the spatial and spectral data. A substantially affine transformation is one which can be continuously approximated locally by affine transformations.

The system uses a predetermined method of comparing image regions consisting of predetermined configurations of pixels, such as the square root of the sum of the squares of the differences between corresponding pixel values when the image regions consist of identically sized rectangles. Each domain block is then compared with one or more of the range blocks and, while comparing, classification data is generated based upon a comparison of the domain block with such range blocks, the classification data including the comparison result, geometric information relating to the locations and descriptions of the range blocks, specifically including whether the range block originated from the query image or from the collection of searchable imagery, and the description of the substantially affine transformation, if any, which was applied to create the range block data. A determination of the likelihood of at least a specific portion of one or more query images being similar to specific portions of the searchable image data can be made based upon the classification data aggregated over the collection of domain blocks, using at least a measure of the extent to which domain blocks compare less closely to range blocks chosen from the query image(s) than to range blocks chosen from specific portions of the searchable image data.

The present invention attempts to accurately classify images that are likely to contain similar target images rather than specifically seek an exact match. In classification, the goal is not to exactly identify a target image but rather to categorize the image data (or the specific portions of the image data such as domain blocks) as to their likelihood of containing the target image. The present invention accordingly applies to classification as well as identification.

The one or more target images can be selected from the searchable image data itself, and in such manner, other like objects can be located within the object data. Further, the image data can be preprocessed in a predetermined manner after receipt thereof, such as through substantially affine transformations, scaling the image data to a pre-determined size, segmenting the image data, purposely blurring or altering the image data, or marking certain image areas to be ignored during the comparison of each domain block with one or more range blocks. And the steps of image recognition can be iterated to further review either specific or all of the image data based upon the classification data.

The substantially affine transformations create different views and appearances in the searchable image data or in the query imagery or in both, allowing the system to indicate a high likelihood of similarity even when the query imagery is acquired under different conditions from the searchable image data. Additionally, the system can indicate a high likelihood of similarity even when the query imagery represents partial views of objects in the searchable image data, or partial views of the searchable image data, since the comparison data can be high when those partial views are represented in the searchable image data.

The classification data can be chosen so that maximum similarity is indicated only when each domain block of the target image is a substantially affine transformation of at least one range block. In many cases of real image databases, this theoretical condition of maximum similarity is enough to imply that maximum similarity is indicated only when the two images are identical. And in one embodiment, the likelihood of similarity is determined by using a function of two variables with values between 0 and 1, where the first variable is a specific portion of a query image and the second variable is a specific portion of the image data.

A correlation of image data among different target images and/or query images can also be utilized along with the classification data to increase the accuracy of the invention. For example, one or more of the target images can be very similar images of the same object, such as successive frames of video. In such a case, the extent to which the classification data of such similar images is itself correlated may be included in the classification data of one or more of the similar images. By aggregating classification data from different target images, the nature of target images can be extended to include sets of target images, such as video sequences, or other sets of images related in some manner. Similarly, the nature of query images can be extended to include sets of query images, and the classification can be extended to include classification data of such extended sets of target and query images. The invention can also be utilized whether the target and query images are individual images or more general collections of correlated image data. This correlation aspect of the invention can be applied to instances of searchable image data even when there is no a priori knowledge of the similarity of images within such data, through selecting some or all such data to be query images, using the classification data thereby generated to determine the similarity of images within such data, and thereafter utilizing the approach of correlation. The correlation of image data within one or more target and/or query images can also be utilized along with the classification data to increase the accuracy of the invention and reduce the overhead and cost of utilizing the invention.

The present invention therefore provides an improved unconstrained system of image recognition that searches realistic, imperfect imagery data for the presence of target images and successfully indicates at least the likelihood of one or more similar target images being present. Through the use of substantially affine transformation, segmentation, and other data manipulation, variations of target images can be located within the image data even though the target image has a significantly different visual relationship or appearance within the image data.

Other objects, features, and advantages of the present invention will become apparent after review of the hereinafter set forth Brief Description of the Drawings, Detailed Description of the Invention, and the Claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a perspective view of an apparatus for recognizing images utilizing a computer and an image capture source acquiring a Query (Q) image for review to determine the presence of a Target (T) image.

FIG. 2 is a diagram illustrating the transformation of the Query (Q) image into a set of domain blocks.

FIG. 3 is an illustration of the transformation of a Target (T) image through a series of affine transformations into a set of range blocks.

FIG. 4 is an illustration of the transformation of a Target (T) image through segmenting the target image into a set of range blocks.

FIG. 5 is an illustration of the transformation of a Target (T) image through blurring and color alteration into a range block.

FIG. 6 is a diagram illustrating the searching of a specific block in the domain set for the similarity to a range set of blocks generated in FIGS. 3 and 4.

FIG. 7 is a flowchart of a basic embodiment of the process of image recognition of one or more target images within digitized image data.

FIG. 8A is a flowchart of an enhanced embodiment of the process of image recognition of one or more target images within digitized image data.

FIG. 8B is a continuation of the flowchart of FIG. 8A.

DETAILED DESCRIPTION OF THE INVENTION

With reference to the figures, in which like numerals represent like elements throughout, FIG. 1 is a perspective view of an apparatus for recognizing images utilizing a computer 10 or other processor and an image capture source 12 that acquires a Query (Q) image 14 from a field of vision. A target image (T) 16 is input into the computer 10, and the query image 14 is reviewed for the presence of the target image 16 as is further described herein. In searching the image data, the present apparatus and method utilize a known method for processing image data as described in U.S. Pat. No. 5,065,447. Further, the concept of using fractal geometry to perform pattern recognition was described in U.S. Pat. No. 5,347,600. The subject matter of both patents is fully incorporated herein by this reference. It should be noted, however, that no claims of either of these patents relate to image recognition. Moreover, a prior method of using the results of the image processing methods contained within the '447 and '600 patents was discussed in Sloan, A., Image Recognition for Retrieving Database Contents, Advanced Imaging, May 1994, pp. 26-30. This prior method involves computing a measure M(Q,T) of the similarity of two images Q and T using fractal geometry. Given an image A, one can specify geometrical sub-regions of A by describing their shape and location. For example, one such sub-region may be a 4×4 array of pixels whose upper left hand corner coincides with the upper left hand corner of A.

Thus, as shown in FIG. 2, the query image 14 can be partitioned into a set of blocks, called domain blocks 20. The specific process to measure the similarity of images is as follows:

(a) Choose a partition, U, of sub-regions of Q, to create “domain” blocks, i.e., the union of the set of nonintersecting domain blocks equals Q.

(b) Choose another set, V, of image sub-regions, called “range” blocks, that can either be sub-regions of Q or of T.

(c) Choose a set, W, of transformations S, where each such S maps some of the range blocks into domain blocks, and where every domain block is mapped into in this way.

(d) Choose an image metric ∥(A, B)∥ which measures the distance between two image regions A and B.

(e) Initialize M(Q,T)=0.

(f) For each domain block D of Q, in any order:

f.1 Compute minimum{∥(D, S(R))∥: S is in W and R is in V}, and let S_(D) be the transformation S, and R_(D) be the range block R, for which the minimum occurs.

f.2 If R_(D) is contained in T, then increment M(Q,T) by 1.

Then, normalize M(Q,T) by dividing by the number of domain blocks in Q, so that M is always between 0 and 1.
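For concreteness, the prior-art measure of steps (a) through (f) can be sketched in Python; the simplifying assumptions here are that domain and range blocks are same-sized non-overlapping squares and that W contains only the identity and the three 90° rotations. The helper names blocks and prior_M are invented for the sketch.

```python
import numpy as np

def blocks(img: np.ndarray, size: int):
    """Non-overlapping size x size sub-blocks of an image."""
    h, w = img.shape
    return [img[i:i + size, j:j + size]
            for i in range(0, h - size + 1, size)
            for j in range(0, w - size + 1, size)]

def prior_M(Q: np.ndarray, T: np.ndarray, d: int = 4) -> float:
    """Steps (a)-(f): for each domain block of Q, find the closest
    transformed range block drawn from Q or T, and count how often
    the winner came from T; then normalize to lie between 0 and 1."""
    transforms = [lambda r: r,
                  lambda r: np.rot90(r, 1),
                  lambda r: np.rot90(r, 2),
                  lambda r: np.rot90(r, 3)]          # the set W
    domain = blocks(Q, d)                            # the partition U
    ranges = ([(R, 'Q') for R in blocks(Q, d)] +     # the set V
              [(R, 'T') for R in blocks(T, d)])
    M = 0
    for D in domain:
        best = min(((np.linalg.norm(D.astype(float) - S(R).astype(float)), src)
                    for R, src in ranges for S in transforms),
                   key=lambda t: t[0])
        if best[1] == 'T':                           # step f.2: R_D lies in T
            M += 1
    return M / len(domain)                           # normalization
```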

However, the prior derivation of M has several problems with regard to its application in pattern recognition. For example, if N is the number of Domain Blocks in Q, then M(Q,T) is always of the form k/N for some number k between 0 and N. If Q is an image with L×L pixel resolution and the Domain Blocks are d×d in pixel resolution, then the number, N, of Domain Blocks in Q is (L/d)^2. If d=4, for example, then:

    L          N
    32         64
    256        4,096
    4,096      ~1 M
    8,192      ~4 M
    131,072    ~1 G
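The growth of N with L is simple arithmetic and can be reproduced directly (the function name is an assumption):

```python
def num_domain_blocks(L: int, d: int = 4) -> int:
    """N = (L/d)^2, the number of d x d domain blocks in an L x L image."""
    return (L // d) ** 2

for L in (32, 256, 4096, 8192, 131072):
    print(L, num_domain_blocks(L))   # 64, 4096, ~1 M, ~4 M, ~1 G
```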

Consequently, N represents a theoretical limit on the number of images in a database which the above method can distinguish among using images Q of resolution L×L, which limits the recognition accuracy of the method. Any attempt to increase the recognition accuracy by increasing the resolution requires acquiring, storing, accessing and communicating the higher resolution images, which leads to higher costs and longer processing and delay times. Thus, outside of some limited range, it is impractical to deal with high-resolution pictures. Such is the reason that most digital cameras acquire images having pixel areas ranging from hundreds of thousands to millions of pixels rather than billions of pixels. Moreover, in many image recognition applications, images themselves are not available at such resolution. For example, video of a crowd scene from a sporting event which has already taken place may contain faces only at a small resolution, for example pictures at 20×20 pixels. Third, the computation required in the calculation of M(Q,T) increases as the fourth power of L, if both Q and T have resolution L×L. This means the cost of a system implementing this method increases rapidly, and after some resolution, becomes impractical. Fourth, large databases, such as satellite photographs, tend to have so many images that a significant level of accuracy of M is necessary to provide a valid indication that a target image may be present, or else the resources necessary to review the data are too great. However, the level of accuracy of M is directly related to the theoretical limit of how many images M can distinguish between. These reasons explain why the prior method was not adopted for use in practical applications.

One way of understanding the difference between constrained and unconstrained systems is through the scope of the problem each respectively addresses. For example, consider binary images of pixel dimension 32×32. There are 2^(32×32) such images. Since any image difference could be of potential importance for some application, an optimally effective unconstrained system would need to be able to distinguish, in principle, between any two such images. On the other hand, suppose the problem was to consider only 32×32 binary images where one pixel was non-zero. There are only 2^10, or approximately one thousand, such images. Therefore, a constrained system designed to work with only 32×32 binary images with exactly one non-zero pixel would need, in principle, to distinguish between approximately one thousand images to be optimally effective. Some unconstrained systems are in fact only able to distinguish between relatively few images. For example, suppose the prior method discussed above was applied to gray scale images with 256 intensity levels and pixel resolution L×L. That method is apparently unconstrained since it can, in principle, deal with any such image. However, it maximally distinguishes, in principle, at most L^2 images, and this occurs for the choice d=1, even though there are 256^(L^2) such images. When the ability of an apparently unconstrained system to distinguish between images is so severely reduced, it is termed “implicitly constrained,” rather than unconstrained. That is, even if there are no a priori limitations based on an underlying image structure for an image recognition system, the prior art system can, in principle, only distinguish between at most H×V images, where H is the horizontal resolution and V is the vertical resolution. Through the determination of the likelihood of similarity, the present invention does not have these implicit constraints in image recognition and can adjust the likelihood by accounting for any errors inherent in the block generation and comparison process, as is further defined herein.

As shown in FIGS. 3-5, the present system performs transformations at least on the target image 16, which can be separately entered into the system or obtained from the image data itself, to obtain possible variants on the target image 16 that may appear in the image data. In FIG. 3, the transformation of a Target (T) image occurs through a series of affine transformations of the target into a set of range blocks, which here is both downward scaling of the target image 16 and rotation of the image through 90° increments. Thus, the target image 16 is simply downwardly scaled for target image 22, then rotated 90° right in target image 24, then 90° right again for target image 26, and finally 90° right again to obtain target image 28. Thus, a wide range of potential variants of the rotation of the target image is generated.
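A minimal Python sketch of the FIG. 3 style variant generation follows; the choice of 2×2 pixel averaging for the downward scaling is an assumption, since the text does not fix a particular scale factor.

```python
import numpy as np

def rotation_variants(target: np.ndarray) -> list:
    """Downward-scale the target by 2x2 averaging, then generate the
    three successive 90-degree clockwise rotations, as in FIG. 3."""
    h, w = target.shape
    trimmed = target[:h - h % 2, :w - w % 2].astype(float)
    scaled = trimmed.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    variants = [scaled]
    for _ in range(3):
        variants.append(np.rot90(variants[-1], k=-1))  # 90 deg clockwise
    return variants
```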

FIG. 4 is an illustration of other transformations of the target image 16 through segmenting the target image 16. The target image 16 is thus segmented into a left vertical segment image 30 and a right vertical segment image 32, and a top segment image 34 and a bottom segment image 36. Accordingly, all of the target images 16, 22, 24, 26, 28, 30, 32, 34, 36 are aggregated into a range set 44 of target image range blocks as shown in FIG. 6. Further transformations of the target image 16 are possible, such as blurring and color alteration as shown in FIG. 5. With use of segmentation, known correlations between specific portions of a target image 16 or different target images can be determined to be consistent with correlations between portions of image data otherwise determined to be similar. For example, the appearance of half of the image, e.g., image 30, can indicate that the target image is present.

In FIG. 5, the target image 16 is both blurred and has its underlying color altered into a target image block 40. Through such transformation, both blurred images that are likely to occur in the image data and changes in color of the image, e.g., painting a boat a different color or a person altering their appearance, can be searched. To one of skill in the art, it is clear that other types of transformations may be similarly included or excluded, including but not limited to transformations of the spectral and/or spatial variables by varied amounts comprised of (a) scaling, rotation and translation operations, (b) averaging varied groups of pixels using different weighting functions such as Gaussian and motion blurring, (c) perspective transformations and (d) functions which can be locally approximated by such functions in a continuous manner, either alone or in composition with each other. It can thus be seen that the search of variant target images can be selectively included or excluded from the range set 44, or such variants can be generated in successive iterations of the recognition process as is further described herein.
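The blurring and spectral alteration of FIG. 5 can likewise be sketched; here a simple box filter stands in for the Gaussian or motion blurs mentioned above, and the parameter names are assumptions.

```python
import numpy as np

def blur_and_recolor(target: np.ndarray, p: float = 1.0,
                     p_shift: float = 0.0, kernel: int = 3) -> np.ndarray:
    """Blur by local averaging, then apply the spectral affine map
    v -> p*v + p_shift, clipped to the valid 0..255 pixel range."""
    h, w = target.shape
    padded = np.pad(target.astype(float), kernel // 2, mode='edge')
    blurred = np.zeros((h, w))
    for di in range(kernel):
        for dj in range(kernel):
            blurred += padded[di:di + h, dj:dj + w]
    blurred /= kernel * kernel
    return np.clip(p * blurred + p_shift, 0, 255)
```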

FIG. 6 illustrates the searching of a specific domain image block 42 in the domain set 20 with a range set 44 of blocks generated in FIGS. 3 and 4. Normally, for each such domain block such as block 42, each block of the range set 44 will be compared thereto to determine the likelihood that the target image 16 is present. Domain block 42 is selected to illustrate that, as it contains an upright image of the upper torso of the person of the target image 16, the block 42 will exhibit the strongest similarity to target image 34, i.e., M(Q,T) approaching 1, and will have some similarity to target images 16 and 22, i.e., M(Q,T) less than 1 but non-trivial.

One difficulty with the prior art method described above is that the M(Q,T) function reflects the extent to which that method does not have the property: M(Q,T)=1 if and only if Q=T. In the prior art method, M(Q,T) was always between 0 and 1, and in certain cases M(Q,Q)=1, but in other cases M(Q,Q)<1. Since the prior art method intends that larger values of M(Q,T) mean greater similarity between Q and T, one method of attempting to increase the value of M(Q,Q) was by spatially scaling the image comprising the second argument of M by a factor equal to the reciprocal of the spatial scaling factor in the transformations S in W. The general class of transformations included spectral scaling as well in the processing of the images into domain blocks, and for this reason it was still possible for M(Q,Q)<1. To increase accuracy and lessen this scaling problem, the present system includes replacing each set T by the result of applying the inverse of each S occurring in the calculation of M(Q,T) to T prior to using it in this calculation.

The present system corrects the theoretical and practical limits of the prior method to distinguish between large numbers of small images through one or more error checking steps while comparing domain blocks with range blocks. In the following notation, define V(T) to be those range blocks which are in T and V(Q) to be those range blocks which are in Q. For each domain image block D in Q, the error in representing the domain image block by S(R) is computed for all predetermined transformations S and predetermined range image blocks R. Here S(R) is therefore the result of applying the transformation S to the range image block, R. This error is defined by ∥(D, S(R))∥ and denoted by ERROR(D,S,R).

The process for error correction is then:

(f′) For each Domain Block D of Q, in any order:

f′.1 Compute ERROR(D,Q) = min{ERROR(D, S, R): S in W, R in V(Q)}.

f′.2 Compute ERROR(D,T) = min{ERROR(D, S, R): S in W, R in V(T)}.

f′.3 If ERROR(D,Q) > ERROR(D,T), then increment M(Q,T) by (ERROR(D,Q) − ERROR(D,T))/ERROR(D,Q).
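Rendered as Python, steps f′.1 through f′.3 take the following shape; the callable error(D, S, R) stands for ∥(D, S(R))∥, and the argument names are assumptions of the sketch.

```python
def corrected_M(domain_blocks, range_blocks_Q, range_blocks_T,
                transforms, error):
    """Score each domain block of Q by how much worse its best match
    within Q is than its best match within T (steps f'.1 - f'.3)."""
    M = 0.0
    for D in domain_blocks:
        err_Q = min(error(D, S, R) for S in transforms
                    for R in range_blocks_Q)        # f'.1
        err_T = min(error(D, S, R) for S in transforms
                    for R in range_blocks_T)        # f'.2
        if err_Q > err_T:                           # f'.3
            M += (err_Q - err_T) / err_Q
    return M
```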

If M(Q,T)=1, then each domain image block of Q is locally a transformation of some range image block of T. In contrast, in the prior unconstrained method, M(Q,T)=1 implies that Q=T locally, meaning that each domain image block of Q equals a range image block of T. For many practical collections of query and target images, it turns out that M(Q,T)=1 if and only if Q=T.

This error correction methodology is best used with certain transformations; for example, if the transformations include rotations by 90° clockwise as shown in FIG. 3, then M(Q,T)=1 implies that each domain block of Q locally equals a range image block of T, or a 90° clockwise rotation of a range image block of T. In many searches, the orientation of an object is not material to its classification or identification, and this aspect of the present invention permits identification or classification independent of orientation. This use of transformations of the target image makes no assumptions of underlying structure, and consequently, recognition performance on imagery of poor quality degrades smoothly over a wider range of image variations, in contrast to constrained or implicitly constrained methods.

FIG. 7 is a flowchart illustrating the process of a basic embodiment of image recognition of one or more target images within digitized image data. Image data is input, likely although not necessarily containing a query image (Q), as shown in step 60, and a database of target images (T) is determined either from an input target image 16, or a portion of the image data containing an image desired to be located within other portions of the target data can be used to generate the target images. Both the image data and target images can be received from one or more sources including one or more databases, local or remote to the processor, computer memory in any format, computer storage devices such as hard disk, floppy disk, CD-ROM, and DVD, one or more live digital video sensors such as a video camera, CCD, infrared image sensor and satellite image sensor, digital still camera, scanned photographic film, slides, developed pictures, digitized video from analog sources whether stored or live in any format, digital and digitized medical images such as x-rays, sonograms, CAT scans, magnetic resonance images, and digital and digitized military images such as radar, sonar, projectile tracking sensors and night-vision sensors.

The input image data is then preprocessed to generate possible variants in the domain set, as shown at step 62. This step of preprocessing can be done to the image data (Q), the target image (T), or both. The preprocessing can include, without limitation: substantially affine transformations, scaling images to a pre-determined size, cropping images to a pre-determined size, histogram equalization, changing the perspective of images (for example, taking overhead surveillance photographs obtained at various oblique angles and processing them to estimate or approximate photographs taken at differing angles) and such other techniques so as to increase uniformity of the images relative to subsequent preprocessing, and scaling and cropping images to possibly different sizes depending on image content. An example is segmenting faces from images of heads, or heads and torsos, or full bodies as shown in FIG. 4, and then, for each image, choosing one scaling factor for each of the horizontal and vertical dimensions so that each segmented face has the same horizontal and vertical resolution, or same area, by maintaining or relaxing constraints on aspect ratios.

The preprocessing can further include decreasing the image data for the purpose of improving utility and/or speed, including but not limited to (i) marking certain image areas to ignore, with examples being common smooth regions, common color regions such as hair on head images or water in images of ships, where such marking may be interactively performed by an operator of the system, or creator or editor of the image, or automatically by a computer program or other image processing system, and (ii) marking certain images to ignore, such as successive video frames where little or no change occurs. The preprocessing can likewise include increasing the image data for the purpose of improving utility, for example, computing variations on the received image data to represent transformed versions of the received image data, where such transformations may include but are not limited to (i) rotations by pre-determined angles and scaling by pre-determined factors, (ii) cropping at predetermined intervals, (iii) blurring to simulate images which are out-of-focus by pre-determined amounts, (iv) blurring to simulate images which were obtained by relative motion and acceleration of camera and subject, (v) lighting transformations to simulate images created indoors by different types, numbers and positions of light sources, (vi) lighting transformations to simulate images created outdoors at different times of day, at different times of year and under different weather conditions, (vii) perspective transformations to simulate images obtained by cameras at different orientations to the subject, and (viii) lens transformations to simulate images produced by one or more lenses from images representing images produced by other lens(es), such as the image produced by a fish-eye lens.

The process then computes a comparison measure between Q and each T, or Compare(Q,T), by first obtaining each T in the database, as shown at step 64; for each T, T is input as shown at step 66, and then preprocessed to generate a possible range of variants as discussed above, as shown at step 68, and then a comparison is made for that T and its variants to Q, as shown at step 70. In one embodiment, each query image Q and target image T are gray scale images with pixel values ranging from 0 to 255 so that each pixel may be represented by 1 byte of data. If the pixel dimensions of Q are not a multiple of 4, then either 1, 2, or 3 rows and/or columns are cropped in the preprocessing step so that the resulting image has horizontal and vertical pixel resolutions which are multiples of 4. The domain blocks are 4×4 squares of pixels. The domain blocks are then uniquely specified by further requiring that they form a partition of Q and so are non-overlapping. Each T is pixel doubled in its pre-processing step so that its new dimensions are twice the original dimensions. The range blocks are taken to be 8×8 squares of pixels where the upper left hand corner of each square has pixel coordinates that are a multiple of 2. Next, new range blocks are created by rotating existing range blocks by 90, 180, and 270 degrees clockwise and added to the original collection of range blocks to form an expanded set of range blocks. In the preferred embodiment, the distance between two square blocks, A and B, of pixels having pixel dimension 4×4 is defined by ∥A−B∥ = square root of sum{Square[(A(i,j)−B(i,j))]: 0<i,j<5}, where the pixel in the ith row and jth column of A and B is A(i,j) and B(i,j), respectively.
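The block-generation rules of this embodiment can be sketched as follows; pixel doubling via np.kron and the helper names are assumptions consistent with, though not dictated by, the description above.

```python
import numpy as np

def domain_blocks(Q: np.ndarray):
    """4x4 non-overlapping domain blocks partitioning Q, which is
    assumed to have been cropped to dimensions that are multiples of 4."""
    h, w = Q.shape
    return [Q[i:i + 4, j:j + 4] for i in range(0, h, 4)
                                for j in range(0, w, 4)]

def range_blocks(T: np.ndarray):
    """8x8 range blocks of the pixel-doubled T whose upper left hand
    corners have even coordinates, expanded with 90/180/270-degree
    clockwise rotations."""
    T2 = np.kron(T, np.ones((2, 2), dtype=T.dtype))   # pixel doubling
    h, w = T2.shape
    base = [T2[i:i + 8, j:j + 8] for i in range(0, h - 7, 2)
                                 for j in range(0, w - 7, 2)]
    return [np.rot90(R, -k) for R in base for k in range(4)]

def block_distance(A: np.ndarray, B: np.ndarray) -> float:
    """||A - B||: square root of the sum of squared pixel differences."""
    d = A.astype(float) - B.astype(float)
    return float(np.sqrt((d * d).sum()))
```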

This embodiment makes use of a transformation which takes 8×8 squares of pixels into 4×4 squares of pixels by averaging as follows: For any square array of pixel data, C, with pixel dimension 8×8, define AVG(C) to be the 4×4 square array of pixel data with pixel value in the ith row and jth column = (1/4)(C(2i,2j)+C(2i+1,2j)+C(2i,2j+1)+C(2i+1,2j+1)), where 0<i,j<5.
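In Python, with 0-based indexing and C assumed to be an 8×8 NumPy array, the AVG transformation reads:

```python
def AVG(C):
    """Average an 8x8 pixel array down to 4x4: each output pixel is
    the mean of the corresponding 2x2 square of input pixels."""
    C = C.astype(float)
    return 0.25 * (C[0::2, 0::2] + C[1::2, 0::2]
                   + C[0::2, 1::2] + C[1::2, 1::2])
```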

This embodiment also employs additional transformations defined as follows. For any square array of pixel data, C, having pixel value C(i,j) in the ith row and jth column, and any real numbers p and p′, define the transformation G(p,p′) from square arrays of pixel data to square arrays of pixel data by specifying that the pixel in the ith row and jth column of G(p,p′) applied to C is p*C(i,j)+p′, it being understood that if the resulting pixel value is below 0 it is reset to 0 and if it is above 255, it is reset to 255. This resulting pixel array is denoted by G(p,p′,C). The set, W, of transformations, S, is applied to each range block, R, whether in Q or T, where S(R)=AVG(G(0.75,p,R)), for p any integer satisfying −256<p<256.
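A sketch of G and of one member of W, reusing AVG from the preceding sketch (the function signatures are assumptions):

```python
import numpy as np

def G(p, p_prime, C):
    """Spectral transformation: each pixel value v becomes p*v + p',
    with results clipped to the 0..255 range as described above."""
    return np.clip(p * C.astype(float) + p_prime, 0, 255)

def S_member(R, p):
    """One member of the set W applied to a range block R:
    S(R) = AVG(G(0.75, p, R)), for an integer p with -256 < p < 256."""
    return AVG(G(0.75, p, R))
```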

Thus, in the preferred embodiment, each domain block D in Q and each range block R in T are comparable, and they are compared as follows. For each domain block D in Q, range block R, and S in W, compute ERROR(D,S,R)=∥D−S(R)∥. If the process has not reached the last range block, then choose the next R. Once all range blocks have been so processed, the similarity S(Q,T) of Q and T can be computed.

In the preferred embodiment, the similarity is defined by means of descriptors, which are defined by:

DES(D,Q) = min{ERROR(D, S, R): S in W, Range Block R in Q}; and

DES(D,T) = min{ERROR(D, S, R): S in W, Range Block R in T},

where it is understood that in the preferred embodiment T has been pixel doubled, the images preprocessed, and the range blocks extended as described above. Then:

Initialize S(Q,T)=0. For each domain block D in Q, in any order:

If DES(D,Q) > DES(D,T), then increment S(Q,T) by (DES(D,Q) − DES(D,T))/DES(D,Q).

Continue until all domain blocks D in Q have been so processed.

After the database of T has been iterated through, the classification data is generated based on {S(Q,T): T in Database}, as shown in step 72. Then a determination is made as to whether iteration of the recognition process is necessary, as shown at decision 74. The determination of iteration can be made based upon any criteria that would indicate that the first iteration was unsatisfactory, such as too many or too few domain blocks being indicated as likely matches. Thus, if iteration is indicated at decision 74, then the process returns to step 64 and can generate a new database of T, changing one or more of the input, pre-processing and/or block generating parameters, and then again generate classification data based upon the new calculation at step 72. If iteration is not indicated at decision 74, the generation of the comparison of {Q,T} based upon {S(Q,T): T in Database} is performed as shown at step 76, and the likelihoods of matches based upon the comparisons of {Q,T} are output as shown at step 78. The process of recognizing images then terminates. In one embodiment, the step of iteration is omitted so as to optimize the process for speed. In another embodiment, the images T in the database are reordered whereby the first image is the one with the highest S(Q,T), the second image is the one with the second highest S(Q,T), and in general the nth image is the one whose score is the nth highest. In the case of tie scores, the relative ranking of images having a given score is chosen randomly. The most likely image T to be similar to Q is then the first one, the next most likely image to be similar to Q is then the second one, and so on, and these likelihoods are output at 78.
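The reordering described here amounts to a descending sort on S(Q,T) with random tie-breaking; a small Python sketch (the scores mapping is an assumed input format):

```python
import random

def rank_targets(scores: dict) -> list:
    """Order database images by descending S(Q,T); images with tied
    scores receive a random relative ranking, as described above."""
    items = list(scores.items())   # scores maps image id -> S(Q,T)
    random.shuffle(items)          # randomize order among tied scores
    items.sort(key=lambda kv: kv[1], reverse=True)  # stable sort
    return [image_id for image_id, _ in items]
```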

The likelihoods output at 78 of the preferred embodiment solve the pre-screening problem. In this problem there is an image analyst who needs to review large numbers of images T to find one similar to Q, but who only has time to review a small number of them, e.g., 1%. In the absence of additional information or process, the analyst simply chooses 1% of the images to review at random. If the analyst finds one that is similar to Q, then he is successful. However, if he does not find a T which is similar to Q, then he has no information about whether there is an image in the 99% of the imagery which he did not look at which in fact is similar to Q. Using the likelihoods output at 78, the analyst would look at the first 1% of the T's which have the highest S(Q,T) scores. Even if the analyst does not find an image which is similar to Q, the analyst can have confidence that the other unreviewed images are even less likely to be similar to Q.

FIG. 8A is a flowchart of an enhanced embodiment of the process of image recognition of one or more target images within digitized image data. The image data is received in an image data stream as shown at step 80, and the image data should include one or more visual images that can be searched. It should be noted that the image data may contain no actual images whatsoever, and thus the process can occur and generate M(Q,T)=0, but it is preferred that the process only occur when searchable images are present due to the overhead necessary to engage in the recognition process. The data stream is then digitized into image data as shown at step 82, and then the image data is preprocessed as described above, as shown at step 84. A set of domain blocks (20 in FIG. 2) is then generated from the image data as shown at step 86, which is also shown in FIG. 2.

The target image 16 is then received as shown in step 88, and one or more substantially affine transformations are performed on the target image 16, as shown at step 90. One example of substantially affine transformations is shown as performed on a target image in FIG. 3. A set of range blocks is then generated as shown at step 92, and each domain block of the domain set is compared with the set of range blocks (such as range set 44 in FIG. 6), as shown at step 94. A determination is then made during each comparison as to whether the blocks are comparable, as shown at decision 96.

If the blocks are not comparable at decision 96, then the process forwards to determine if the last domain block has been reached at decision 106, which is more fully described below. If the blocks are comparable at decision 96, then a second block is generated for the domain block, as shown at step 98, which is for determining any error occurring in M(Q,Q). Then the second block is compared with the first domain block as shown at step 100, and a determination is made as to whether the match between the domain block and the range block (M(Q,T)) is greater than the match between the first and second domain blocks (M(Q,Q)), as shown at decision 102. If the level of matching is greater at decision 102, the classification level is stored noting the greater matching, as shown at step 104. Thereafter, or if there was not a greater level of matching at decision 102, a determination is then made as to whether the last domain block has been reached, as shown at decision 106.

If the last domain block has not been reached at decision 106, then the process iterates to fetch the next domain block at step 94. It should be noted that the block comparison can be done in many different orders. For example, all range blocks from the range set 44 can be compared before the process iterates to the next domain block. Or alternatively, many comparisons involving the same blocks can occur during a single iteration, and an average score can be generated for the domain block. If the last domain block has been reached at decision 106, then classification data is generated based upon at least the two levels of matching, as shown at step 108, and then a determination is made as to whether iteration of the matching process is necessary, as shown at decision 110. If iteration is necessary at decision 110, then the process iterates to step 84 and begins to preprocess the image data once again. Otherwise, if iteration is not necessary at decision 110, then the likelihood of a match is output to indicate which domain blocks are likely matches for the target images, as shown at step 112. Then the image recognition process is terminated.

The classification data and likelihood of similarity can vary continuously with altered parameters during the transformation of the image data, the target image, or both. An example is further blurring of the target image, as shown in FIG. 5, until the classification data reaches a predetermined threshold. While it is preferred that the classification data indicates maximum similarity only when one image is locally an affine transformation of the other, the level of matching should be a reliable indicator that a match is possible such that human review of at least the specific domain block is needed.

It can thus be seen that the present invention provides a method for recognizing one or more images within digitized image data, including the steps of digitizing image data (step 82); generating a set of domain blocks from the image data (step 86), with each domain block representing a discrete portion of the image data as shown in FIG. 2; generating a set of range blocks from a predetermined one or more target images that are desired to be located within the image data (FIG. 6), the range blocks corresponding to discrete portions of the one or more target images, and the range blocks being transformed by one or more substantially affine transformations with predetermined coefficients; comparing each domain block with one or more of the range blocks (step 94); while comparing, generating classification data based upon a measurement of whether matching is achieved when a range block representing at least a portion of the one or more target images is similar to a domain block, and at least a measurement of whether better matching is achieved when a range block is chosen from image data representing the image which is the source of the domain block (decision 102); and determining the likelihood of at least a specific portion of one or more target images being similar to specific portions of the image data based upon the classification data. The method can include the step of preprocessing the image data in a predetermined manner, such as scaling images to a pre-determined size, segmenting the image data, or marking certain image areas to be ignored during the comparison of each domain block with one or more range blocks. The method can also include iterating the steps of image recognition based upon the classification data, as shown by decision 110.

The step of generating a set of range blocks by one or more substantially affine transformations can be generating a set of range blocks by at least spectral translation, spatial translation, or one or more rotations. The method can further include the step of correlating image data and target images with the classification data, and such correlation can be between specific portions of a target image with correlations between portions of image data.

The step of generating classification data can be indicating maximum similarity only when the target image is locally an affine transformation of at least one searchable image in the image data. Further, the step of generating a set of range blocks can be generating the range blocks with different affine transformations applied to different range blocks based on pre-determined criteria, thereby creating additional range blocks as shown in the range set 44 of FIG. 6.

The step of determining the likelihood of at least a specific portion of one or more target images being similar to specific portions of the image data can be determining the likelihood of similarity by using a function of two variables with values between 0 and 1, wherein the first variable is a specific portion of a target image and the second variable is a specific portion of the image data. Moreover, the step of generating a set of range blocks from a predetermined one or more target images can be generation of a set of range blocks from one or more target images within the image data itself.

In view of the method being executable on the computer platform of a computing device such as computer 10, the present invention includes a program resident in a computer readable medium, where the program directs a server or other computing device having a computer platform to perform the steps of the method. The computer readable medium can be the memory of the computer 10, or can be in a connective database. Further, the computer readable medium can be in a secondary storage media that is loadable onto a wireless device computer platform, such as a magnetic disk or tape, optical disk, hard disk, flash memory, or other storage media as is known in the art.

In the context of FIGS. 7-8B, the method may be implemented, for example, by operating portion(s) of a network to execute a sequence of machine-readable instructions. The instructions can reside in various types of signal-bearing or data storage media, whether primary, secondary, or tertiary. The media may comprise, for example, RAM (not shown) accessible by, or residing within, the components of the wireless network. Whether contained in RAM, a diskette, or other secondary storage media, the instructions may be stored on a variety of machine-readable data storage media, such as DASD storage (e.g., a conventional “hard drive” or a RAID array), magnetic tape, electronic read-only memory, flash memory cards, an optical storage device (e.g., CD-ROM, WORM, DVD, digital optical tape), or other suitable data storage media, including digital and analog transmission media.

While there has been shown a preferred embodiment of the present invention, it is to be understood that certain changes may be made in the forms and arrangement of the elements and steps of the method without departing from the underlying spirit and scope of the invention as is set forth in the claims.

1-32. (canceled)
 33. An apparatus for unconstrained recognition of one or more images within digitized image data, comprising at least one processor that: receives image data in a digital format; generates a set of domain blocks from the image data, each domain block representing a discrete portion of the image data; generates a set of range blocks from a predetermined one or more target images that are desired to be located within the image data, the range blocks corresponding to discrete portions of the one or more target images, and the range blocks are transformed by one or more substantially affine transformations with predetermined coefficients; compares each domain block with one or more of the range blocks; while comparing, generates classification data based upon a measurement of whether the comparison is closer when one or more range blocks represent at least a portion of the one or more target images, or the comparison is closer when one or more range blocks represent at least a portion of the image data representing the image which is the source of the domain block, the classification data including an adjustment for error; determines a similarity indication of at least a specific portion of one or more target images being similar to specific portions of the image data based upon the classification data; and provides the similarity indication to a third party application.
 34. The apparatus of claim 33, wherein the third party application is a database.
 35. The apparatus of claim 33, wherein the third party application is a repository accessible through a network.
 36. The apparatus of claim 35, wherein the network is the Internet.
 37. The apparatus of claim 33, wherein the third party application is a search engine.
 38. The apparatus of claim 33, wherein the target images are images of known terrorists.
 39. The apparatus of claim 33, wherein the target images are images of known criminals.
 40. The apparatus of claim 33, wherein the target images are faces of known individuals.
 41. The apparatus of claim 33, wherein the target images are images of transportation vehicles.
 42. The apparatus of claim 41, wherein the transportation vehicles are aircraft.
 43. The apparatus of claim 41, wherein the transportation vehicles are ships.
 44. The apparatus of claim 33, wherein the target images are satellite images.
 45. The apparatus of claim 33, wherein the target images are reconnaissance images.
 46. A method for unconstrained recognition of one or more images within digitized image data, comprising the steps of: digitizing image data; generating a set of domain blocks from the image data, each domain block representing a discrete portion of the image data; generating a set of range blocks from a predetermined one or more target images that are desired to be located within the image data, the range blocks corresponding to discrete portions of the one or more target images, and the range blocks are transformed by one or more substantially affine transformations with predetermined coefficients; comparing each domain block with one or more of the range blocks; while comparing, generating classification data based upon a measurement of whether matching is achieved when a range block representing at least a portion of the one or more target images is similar to a domain block, and at least a measurement of whether better matching is achieved when a range block is chosen from image data representing the image which is the source of the domain block, the classification data including an adjustment for error; determining a similarity indication of at least a specific portion of one or more target images being similar to specific portions of the image data based upon the classification data; and providing the similarity indication to a third party application.
 47. The method of claim 46, wherein the third party application is a database.
 48. The method of claim 46, wherein the third party application is a repository accessible through a network.
 49. The method of claim 48, wherein the network is the Internet.
 50. The method of claim 46, wherein the third party application is a search engine.
 51. The method of claim 46, wherein the target images are images of known terrorists.
 52. The method of claim 46, wherein the target images are images of known criminals.